In the rapidly advancing technology landscape, data governance is an area that has undergone substantial evolution. This white paper journeys from the foundational principles of traditional data governance, dubbed ‘Data Governance 1.0,’ to the intricacies introduced by generative AI. Our exploration offers a comprehensive overview of the legacy governance framework and delves into the contemporary challenges and considerations posed by the AI-driven landscape.


Readers of this Executive Summary will gain insight into both the historical context and the pressing implications of today’s ‘need to know’ AI-centric Data Governance 3.0.


Data Governance 1.0 


Data Governance 1.0 provides the basic framework for a typical Data Governance Program. This white paper first illustrates Data Governance 1.0 as it stood before the emergence of Generative AI and Machine Learning. This section of our white paper establishes foundational frameworks for a robust Data Governance Program. We then build from this foundation to an enhanced Data Governance 3.0 framework aligned with Web 3.0 and covering Generative AI and Machine Learning. We skip Data Governance 2.0 intentionally; you can assume this section covers all critical aspects of Data Governance up to the emergence of Generative AI and Machine Learning.


Web 3.0 is much broader than Generative AI, encompassing decentralized finance (DeFi) tenets such as distributed ledger technology and cryptocurrencies. However, AI and the Semantic Web 3.0 will be heavily used and are likely to be the underpinning of this movement (1).


As Robert Seiner puts it simply in his book “Non-Invasive Data Governance: The Path of Least Resistance and Greatest Success,” “Data Governance is the formal execution and enforcement of authority over management of data and data related issues” (2). Said differently, Data Governance comprises the actions that enable data users to know what data is key to success, what good looks like for the data (data quality rules), when good is achieved and when it isn’t (data defects or data quality errors, data breaches, etc.), and when defects or breaches are remediated (data quality dashboards, workflows, escalations, etc.).
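To make “data quality rules” and “data defects” concrete, here is a minimal Python sketch, assuming a dataset held as one dictionary per record. The two rules (`customer_name_present`, `balance_non_negative`), the field names, and the pass-rate metric are hypothetical illustrations, not part of any particular data quality tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class QualityRule:
    """What 'good' looks like for one critical data element."""
    name: str
    element: str
    check: Callable[[object], bool]  # returns True when the value is good


@dataclass(frozen=True)
class Defect:
    """A record that failed a rule; this is what gets tracked to remediation."""
    rule: str
    record_id: str
    value: object


def evaluate(records: List[Dict[str, object]], rules: List[QualityRule]) -> List[Defect]:
    """Apply every rule to every record and return the failures as defects."""
    defects = []
    for record in records:
        for rule in rules:
            value = record.get(rule.element)
            if not rule.check(value):
                defects.append(Defect(rule.name, str(record.get("id")), value))
    return defects


def pass_rate(records, rules, defects) -> float:
    """Share of (record, rule) checks that passed: a simple dashboard metric."""
    total = len(records) * len(rules)
    return 1.0 if total == 0 else (total - len(defects)) / total


# Hypothetical rules and records for a customer dataset
rules = [
    QualityRule("customer_name_present", "customer_name",
                lambda v: isinstance(v, str) and v.strip() != ""),
    QualityRule("balance_non_negative", "balance",
                lambda v: isinstance(v, (int, float)) and v >= 0),
]
records = [
    {"id": "C1", "customer_name": "Acme Corp", "balance": 120.0},
    {"id": "C2", "customer_name": "", "balance": -5.0},
    {"id": "C3", "customer_name": "Globex", "balance": 0},
]

defects = evaluate(records, rules)
for d in defects:
    print(d)
print(f"pass rate: {pass_rate(records, rules, defects):.2f}")  # pass rate: 0.67
```

In practice the defect list would feed the dashboards, workflows, and escalations that confirm when defects are remediated.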


The purpose of Data Governance is to ensure data reliability and consistency across the organization. Proper Governance Programs typically include Policies, Processes, Organizational Design, Data Architecture, and Technology (3).
Policies


Policies are the glue for starting and maintaining an effective Data Governance Program. These policies should cover data ownership; data sourcing, including the definition of trusted sources; data quality and data defect management/escalations; and data usage, including data privacy, retention, and data security. Data sourcing is of particular importance in both Data Governance 1.0 and Data Governance 3.0. In the Data Governance 1.0 framework, data sourcing policies should address the following:


* Data is sourced from a trusted source or a system of record 


* Data is not sourced from a trusted source or system of record 


* Risk acceptance for data not sourced from a trusted source 
or system of record 

* Waiver review process and extension provisions for risk-accepted data sources

* Data defects assigned for remediation, tracked, and managed through completion for data sourced from untrusted sources


* Escalation path for risk-accepted data sources and data defects

*Note: this list isn’t all-inclusive, and thus policies should evolve with your organization’s business activities and data maturity.
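The sourcing outcomes above can be sketched as a small classification step. This is a minimal Python illustration, assuming a registry of trusted sources and a single risk acceptance record with an expiry date; the source names, the approving body, and the four status labels are hypothetical, chosen only to mirror the policy items listed above.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional, Set


class SourcingStatus(Enum):
    TRUSTED = "trusted source or system of record"
    RISK_ACCEPTED = "untrusted source covered by an active risk acceptance"
    WAIVER_EXPIRED = "risk acceptance expired: review extension or escalate"
    UNTRUSTED = "untrusted source: log a data defect and escalate"


@dataclass(frozen=True)
class RiskAcceptance:
    source: str
    approved_by: str
    expires: date


def classify_source(
    source: str,
    trusted_sources: Set[str],
    acceptance: Optional[RiskAcceptance],
    today: date,
) -> SourcingStatus:
    """Map a data source onto the sourcing policy outcomes."""
    if source in trusted_sources:
        return SourcingStatus.TRUSTED
    if acceptance is None or acceptance.source != source:
        return SourcingStatus.UNTRUSTED
    if today <= acceptance.expires:
        return SourcingStatus.RISK_ACCEPTED
    return SourcingStatus.WAIVER_EXPIRED


# Hypothetical registry and risk acceptance
trusted = {"core_banking_sor", "general_ledger"}
vendor_acceptance = RiskAcceptance("vendor_feed_x", "Data Governance Council", date(2024, 12, 31))

print(classify_source("general_ledger", trusted, None, date(2024, 6, 1)).name)              # TRUSTED
print(classify_source("vendor_feed_x", trusted, vendor_acceptance, date(2024, 6, 1)).name)  # RISK_ACCEPTED
print(classify_source("vendor_feed_x", trusted, vendor_acceptance, date(2025, 2, 1)).name)  # WAIVER_EXPIRED
print(classify_source("team_spreadsheet", trusted, None, date(2024, 6, 1)).name)            # UNTRUSTED
```

Keeping the expiry on the acceptance record is what makes the waiver review and extension provisions enforceable rather than informal.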


Processes 


Data are not “natural substances. They don’t exist in the wild. Data are created” (4). Data processes, from ingress (where the data is collected) through storage, transformation, and consumption, should be understood and documented in the form of metadata, at minimum for your firm’s critical data elements. The metadata should be stored in a proper Master Data Management (MDM) tool. Further, this documentation should include control points such as data movement controls (DMC), data reconciliation, and data privacy controls such as identification of exposure to sensitive data. The documentation should also record deficiencies where controls are still emerging or not yet developed. Documenting control deficiencies may appear counterintuitive, as these gaps then become discoverable by oversight functions such as internal audit, external auditors, and regulatory examiners; however, this self-reporting demonstrates awareness and accountability and shows the corrective action to be taken.


A firm’s MDM can be as sophisticated as an industry-grade commercial platform, such as those offered by Informatica, Collibra, and their many competitors, or as simple as a SharePoint site. The key to proper MDM is that metadata is easily retrievable and consistently updated to remain current with the changing data landscape.
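A metadata entry for a critical data element can be as simple as a structured record. The Python sketch below assumes one possible shape: the element name, its control points (DMC, reconciliation, privacy), whether each control is implemented, and a last-updated date. The field names, the hypothetical `customer_tax_id` element, and the 90-day staleness window are illustrative assumptions, not a prescribed MDM schema.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Control:
    name: str
    kind: str          # "DMC", "reconciliation", or "privacy"
    implemented: bool  # False documents a known control deficiency


@dataclass
class CriticalDataElement:
    name: str
    source_system: str
    contains_sensitive_data: bool
    last_updated: date
    controls: List[Control] = field(default_factory=list)


def open_deficiencies(cde: CriticalDataElement) -> List[str]:
    """Self-reported gaps: controls not yet implemented, plus missing privacy coverage."""
    gaps = [c.name for c in cde.controls if not c.implemented]
    has_privacy = any(c.kind == "privacy" and c.implemented for c in cde.controls)
    if cde.contains_sensitive_data and not has_privacy:
        gaps.append("no implemented privacy control for sensitive data")
    return gaps


def is_stale(cde: CriticalDataElement, today: date,
             max_age: timedelta = timedelta(days=90)) -> bool:
    """Flag metadata not refreshed within the (assumed) 90-day window."""
    return today - cde.last_updated > max_age


tax_id = CriticalDataElement(
    name="customer_tax_id",
    source_system="core_banking_sor",
    contains_sensitive_data=True,
    last_updated=date(2024, 1, 15),
    controls=[
        Control("daily file transfer completeness check", "DMC", True),
        Control("source-to-warehouse record reconciliation", "reconciliation", False),
        Control("masking in reporting layer", "privacy", True),
    ],
)

print(open_deficiencies(tax_id))           # ['source-to-warehouse record reconciliation']
print(is_stale(tax_id, date(2024, 6, 1)))  # True
```

Whether such records live in a commercial platform or a SharePoint list matters less than keeping them retrievable and current, which is what the staleness check is meant to surface.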
Organizational Design


“Organization Design is a process for shaping the way your organization operates, to help you to pursue your strategies and meet your goals. It involves setting up structures and systems, as well as helping people to adapt to new ways of working” (5). These “structures and systems” are either data or powered by data.


Great organizations also aren’t natural substances; rather, they are a function of great people, a great culture, a great mission, and almost always a superior organizational design. The design of the organization provides a blueprint for organizational responsibilities. The corporate world is constantly changing, whether due to growth, contraction, regulation, or new products. This constant change can create disruption even at the best-organized companies, and even more so at companies for which data is a foundational layer.


Having a well-designed data organization will play a key role in minimizing disruption from the daily grind of change; better yet, it will position your company well both defensively (data governance, data quality, etc.) and offensively (data analytics, data science, revenue generation).


Data Architecture 


A well-controlled and well-leveraged data architecture typically has the following characteristics:


* Aligns with the overall IT and business architecture

* Is well understood by key partners designing new data flows and maintaining existing data pipes

* Is governed to ensure compliance with the agreed-upon architecture

* Has a process to allow exceptions when data flows depart from the agreed-upon and governed data architecture


Maintaining a taxonomy of the data hierarchy is also good practice to keep the aforementioned principles in check and to properly illustrate data ownership and alignment. Wherever possible, the platforms, pipes, warehouses, and other technologies in use should be clearly spelled out in the process to avoid confusion. For example, if a particular DataMart is deemed an official domain for a line of business, this point should be documented.

In the event a data producer or consumer wants to break glass on the prescribed data architecture, a governing body should be consulted to either approve or decline the break-glass exception. When break-glass exceptions are granted, they should be well documented, including how long the exception will last, when it will be reviewed, and when it will be remediated.
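The three dates a break-glass exception should carry (how long it lasts, when it is reviewed, when it is remediated) lend themselves to a simple register entry. This Python sketch assumes those dates are stored on each approved exception and derives a status from them; the field names, the example data flow, and the “Architecture Review Board” approver are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BreakGlassException:
    flow: str          # the data flow departing from the governed architecture
    approved_by: str   # governing body that granted the exception
    granted: date
    review_on: date
    expires: date      # remediation deadline
    remediation: str

    def __post_init__(self) -> None:
        if not (self.granted <= self.review_on <= self.expires):
            raise ValueError("expected granted <= review_on <= expires")

    def status(self, today: date) -> str:
        """Derive what the governing body should do with this exception today."""
        if today > self.expires:
            return "expired: remediate or escalate"
        if today >= self.review_on:
            return "review due"
        return "active"


exc = BreakGlassException(
    flow="marketing extract bypassing the enterprise warehouse",
    approved_by="Architecture Review Board",
    granted=date(2024, 3, 1),
    review_on=date(2024, 6, 1),
    expires=date(2024, 9, 1),
    remediation="re-point the extract to the governed DataMart",
)

print(exc.status(date(2024, 4, 1)))   # active
print(exc.status(date(2024, 7, 1)))   # review due
print(exc.status(date(2024, 10, 1)))  # expired: remediate or escalate
```

Rejecting inconsistent dates at creation time keeps the register from holding exceptions that could never be reviewed before they lapse.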
Technology


Data technology governance can cover many aspects of IT, including data engineering, enterprise data products, the enterprise data warehouse, systems of record, Master Data Management, and user-developed tools such as Excel workbooks or Access databases. Data technology tools should be documented as the primary mechanism or standard for the data journey. If data is handled by nonstandard technology, these instances should be documented with waivers. These waivers should state how long the exemption lasts, what the remediation is (typically migrating to the standard), what the waiver extension process is, and how the exceptions are governed. Governance documentation should include approval and escalation paths.
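A technology waiver register can be checked against an inventory of data assets and the tools they run on. The Python sketch below assumes a hypothetical set of standard tools, a per-asset waiver with an expiry, a remediation note, and an extension counter, and a limit of two extensions before the escalation path applies; all names and limits are illustrative, not a recommended policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical list of approved standard tools for the data journey
STANDARD_TOOLS = {"enterprise_data_warehouse", "mdm_platform"}


@dataclass
class Waiver:
    asset: str
    expires: date
    remediation: str   # typically migrating to the standard tool
    approved_by: str
    extensions: int = 0

    def extend(self, days: int, max_extensions: int = 2) -> None:
        """Grant an extension; past the (assumed) limit, route to the escalation path."""
        if self.extensions >= max_extensions:
            raise PermissionError(f"{self.asset}: extension limit reached, escalate")
        self.expires += timedelta(days=days)
        self.extensions += 1


def unwaived_nonstandard(inventory: Dict[str, str], waivers: Dict[str, Waiver],
                         today: date) -> List[str]:
    """Return assets on nonstandard tools that lack an active waiver."""
    findings = []
    for asset, tool in sorted(inventory.items()):
        if tool in STANDARD_TOOLS:
            continue
        waiver = waivers.get(asset)
        if waiver is None or waiver.expires < today:
            findings.append(asset)
    return findings


inventory = {
    "monthly_pnl.xlsx": "excel",
    "customer_master": "mdm_platform",
    "branch_tracker.accdb": "access",
}
waivers = {
    "monthly_pnl.xlsx": Waiver("monthly_pnl.xlsx", date(2024, 12, 31),
                               "migrate to enterprise_data_warehouse",
                               "Data Technology Council"),
}

print(unwaived_nonstandard(inventory, waivers, date(2024, 6, 1)))  # ['branch_tracker.accdb']

pnl = waivers["monthly_pnl.xlsx"]
pnl.extend(90)
pnl.extend(90)
try:
    pnl.extend(90)
except PermissionError as err:
    print(err)  # monthly_pnl.xlsx: extension limit reached, escalate
```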


Additionally, as new products or regulatory changes are introduced into a tech stack, data practitioners should be aligned with these changes. In other words, data should be part of the software development life cycle (SDLC). Including data in the SDLC or the enterprise change management process will keep data usage documented and disciplined.


Data Governance 3.0: Generative AI and Machine Learning


Data Governance 3.0 for Generative AI builds on all key components of the Data Governance 1.0 framework, meaning all of 1.0 applies to 3.0.


The foundational principles and core concepts of data governance remain relevant and applicable. As Bob Seiner writes in his new book, “Non-Invasive Data Governance Strikes Again: Gaining Experience and Perspective,” “the data challenges presented by LLMs are consistent across all approaches to data governance.” However, in this more intricate context, the application of these principles may shift, subtly or markedly, across the key facets discussed above: policies, processes, organizational design, data architecture, and technology.


One significant focus is algorithmic transparency: understanding and explaining AI’s decision-making mechanisms and ensuring the integrity and validation of the underlying data sources. Along with transparency, ethics takes center stage, prompting the introduction of roles such as AI ethicists. Additionally, data architecture and technology must become more flexible and responsive to keep pace with AI’s rapid adaptability. In the age of generative AI, while data governance provides the foundational framework for data quality and security, a deeper dive into data’s origin and journey is essential. This is where data provenance comes to the forefront, offering transparency and accountability and ensuring both trustworthiness and adaptability in our AI-driven landscape.
Data sourcing and provenance, although not new concepts, take on amplified significance when navigating the intricacies of contemporary AI. Specifically, data sourcing entails identifying, securing, and integrating data suited to defined objectives. Meanwhile, data provenance chronicles the lifecycle of data, capturing its genesis, alterations, and ultimate usage.


In the AI landscape, the credibility of our models is intimately tied to the data that informs them, emphasizing the need to ascertain the data’s origin and trace its modifications. Proper documentation of data provenance is indispensable, as it not only supports reproducible research and analysis but also assures the dependability of AI outcomes. Furthermore, in an increasingly stringent regulatory environment, a thorough grasp of a dataset’s origins and history is imperative for ensuring compliance and upholding organizational accountability for AI-derived decisions.


The significance of data provenance is heightened when it is evaluated in tandem with data sourcing, lineage, and governance. Governance crafts the blueprint for data management, accessibility, and usage. Within this blueprint, provenance chronicles the genesis and trajectory of data. Together with sourcing (the acquisition of data) and lineage (its transformation history), provenance offers a comprehensive perspective on the data’s life cycle. This holistic view ensures transparency, precision, and adherence to regulations, which are essential when considering generative AI and the foundational data on which these models are trained.
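Provenance is often modeled as a graph in which each dataset records its inputs and the activity that produced it. The Python sketch below assumes that model, plus a SHA-256 fingerprint of each dataset’s contents so a later copy can be checked against what was recorded. The dataset names and contents describe a hypothetical pipeline feeding a generative AI use case; this is an illustration of the idea, not a specific provenance standard or catalog product.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    dataset: str
    inputs: Tuple[str, ...]  # empty tuple marks an origin (data as sourced)
    activity: str            # how the dataset was produced
    content_sha256: str      # fingerprint of the contents when recorded


def fingerprint(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def trace_origins(dataset: str, catalog: Dict[str, ProvenanceRecord]) -> Set[str]:
    """Walk the provenance graph upstream and return the datasets with no inputs."""
    origins: Set[str] = set()
    seen: Set[str] = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        record = catalog[current]
        if not record.inputs:
            origins.add(current)
        stack.extend(record.inputs)
    return origins


def verify(dataset: str, content: bytes, catalog: Dict[str, ProvenanceRecord]) -> bool:
    """Check that contents still match what was recorded (supports reproducibility)."""
    return catalog[dataset].content_sha256 == fingerprint(content)


# Hypothetical pipeline contents
crm = b"customer_id,segment\n1,retail\n"
reviews = b"Great service at the downtown branch."
features = b"customer_id,segment,sentiment\n1,retail,positive\n"
summaries = b"Customer 1: retail segment, positive sentiment."

catalog = {
    "crm_extract": ProvenanceRecord("crm_extract", (), "ingested from CRM system of record", fingerprint(crm)),
    "web_reviews": ProvenanceRecord("web_reviews", (), "collected from third-party review site", fingerprint(reviews)),
    "customer_features": ProvenanceRecord("customer_features", ("crm_extract", "web_reviews"), "joined and scored", fingerprint(features)),
    "genai_summaries": ProvenanceRecord("genai_summaries", ("customer_features",), "generated by an LLM", fingerprint(summaries)),
}

print(sorted(trace_origins("genai_summaries", catalog)))  # ['crm_extract', 'web_reviews']
print(verify("genai_summaries", summaries, catalog))      # True
print(verify("genai_summaries", b"edited text", catalog)) # False
```

Tracing back to origins is what lets a reviewer see that the generated summaries depend partly on third-party data, which is exactly the sourcing question the next section addresses.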


Digital watermarks 


A good portion of Generative AI data will be sourced outside of existing trusted sources or systems of record. This sourcing could be as simple as using base data from trusted sources and leveraging generative AI models to create advanced data elements. Another scenario may be generating data from external data sources, including text and images on topics of traditional market or third-party data.


Data created using generative AI should include a watermark that refers back to its source. This source can then be reviewed using a risk-based approach to determine whether it is trustworthy. If the data is derived from a trusted source, then the data can be deemed high quality and the source low or no risk. If the data source isn’t trusted, then the appropriate risk rating should be applied to the data source. Each firm’s risk appetite, coupled with the sensitivity of the data output, should be considered when deciding whether to use Generative AI data sources that have an untrusted origin. All of this lineage and provenance should be stored in a metadata catalogue and recertified on a recurring basis, at intervals suited to the specific, evaluated usage and risk assessment.
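One way to realize a “watermark that refers back to the source” is a signed metadata tag stored alongside the generated data in the catalogue. The Python sketch below assumes that approach: it is a metadata-level tag, not a watermark embedded in the generated text or image itself, and the HMAC-SHA256 signing, the trusted/untrusted to low/high risk mapping, and the 365- and 90-day recertification intervals are illustrative assumptions that each firm would set according to its own risk appetite.

```python
import hashlib
import hmac
import json
from datetime import date, timedelta
from typing import Dict

# Assumed policy: source trust -> risk rating -> recertification interval
RISK_BY_TRUST = {True: "low", False: "high"}
RECERTIFY_EVERY = {"low": timedelta(days=365), "high": timedelta(days=90)}


def _sign(fields: Dict[str, str], key: bytes) -> str:
    payload = json.dumps(fields, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def make_tag(content: bytes, source: str, source_trusted: bool,
             key: bytes, created: date) -> Dict[str, str]:
    """Build a signed tag that points generated content back to its source."""
    risk = RISK_BY_TRUST[source_trusted]
    tag = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "source": source,
        "risk_rating": risk,
        "next_certification": (created + RECERTIFY_EVERY[risk]).isoformat(),
    }
    tag["signature"] = _sign(tag, key)  # signs the fields above, before this key is added
    return tag


def verify_tag(content: bytes, tag: Dict[str, str], key: bytes) -> bool:
    """True only if the tag is unaltered and still describes this exact content."""
    fields = {k: v for k, v in tag.items() if k != "signature"}
    untampered = hmac.compare_digest(_sign(fields, key), tag.get("signature", ""))
    matches = fields.get("content_sha256") == hashlib.sha256(content).hexdigest()
    return untampered and matches


key = b"demo-key-store-in-a-secrets-manager"  # hypothetical; never hard-code real keys
summary = b"Q2 outlook summary generated from third-party market commentary."
tag = make_tag(summary, "vendor_market_commentary", source_trusted=False,
               key=key, created=date(2024, 6, 1))

print(tag["risk_rating"], tag["next_certification"])  # high 2024-08-30
print(verify_tag(summary, tag, key))                   # True
print(verify_tag(b"edited summary", tag, key))         # False
```

Signing the tag means a downgraded risk rating or a swapped source name is detectable, and the recorded next-certification date gives the recurring review a concrete trigger.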


Tagging generative AI data with a digital watermark is one way to signal the level of trust in the data source journey and the corresponding data consumption risk. Digital watermarks or similar tagging should be a key component of Data Governance 3.0 for Generative AI data.
Conclusion


Strong Generative AI and Machine Learning is predicated on having a well-conceptualized, well-thought-out, and documented data governance framework (Data Governance 1.0), and then building upon this framework for Data Governance 3.0, focusing acutely on Generative AI and Machine Learning data sourcing. Updates to the existing, traditional data governance framework, such as digital watermarks, will become the industry standard for tagging generated data. Organizations must remain vigilant, updating their data governance frameworks to ensure they are resilient and efficient amid the ever-evolving landscape of AI. Curate Insights has significant experience implementing these models and is ready to assist with your Generative AI and Machine Learning Data Governance 3.0 implementation.


About Curate Insights 


We Are Data Connectors 


Our unique ability to deliver exceptional, action-oriented results is predicated on the people we have providing them. We are passionate about analytics, and we are driven to make the world a better place through a more informed understanding and connection of data. We connect customers across household and other types of relationships.


Our Mission 

At Curate Insights, we enable our clients to collect, analyze, and execute 
data and analytic solutions that lead to data-driven analytic intelligence that 
positively impacts their business. 


Our Vision 

To be the premier data and analytic consulting firm. To enable real business outcomes through the use, organization, and governance of data and analytics. To deliver mission-driven solutions designed to unlock the power of data and analytics to enrich the lives of our people, our clients, and the communities we serve.


Problem Solvers 

To do this, we curate analytic intelligence by connecting business goals to 
technology investments. We build trust and transparency into the data that is 
most critical to grow and optimize your organization. 


What we deliver are solutions and insights built to transform business strategy. 